NSF PAR Search | NSF Public Access Repository

ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension

https://doi.org/10.18653/v1/2022.acl-long.357

Subramanian, Sanjay; Merrill, William; Darrell, Trevor; Gardner, Matt; Singh, Sameer; Rohrbach, Anna (January 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain. While large-scale pre-trained models are useful for image classification across domains, it remains unclear if they can be applied in a zero-shot manner to more complex tasks like ReC. We present ReCLIP, a simple but strong zero-shot baseline that repurposes CLIP, a state-of-the-art large-scale model, for ReC. Motivated by the close connection between ReC and CLIP’s contrastive pre-training objective, the first component of ReCLIP is a region-scoring method that isolates object proposals via cropping and blurring, and passes them to CLIP. However, through controlled experiments on a synthetic dataset, we find that CLIP is largely incapable of performing spatial reasoning off-the-shelf. We reduce the gap between zero-shot baselines from prior work and supervised models by as much as 29% on RefCOCOg, and on RefGTA (video game imagery), ReCLIP’s relative improvement over supervised ReC models trained on real images is 8%.
Full Text Available
AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models

https://doi.org/10.18653/v1/D19-3002

Wallace, Eric; Tuyls, Jens; Wang, Junlin; Subramanian, Sanjay; Gardner, Matt; Singh, Sameer (October 2019, Conference on Empirical Methods in Natural Language Processing (EMNLP): System Demonstrations)

Neural NLP models are increasingly accurate but are imperfect and opaque—they break in counterintuitive ways and leave end users puzzled at their behavior. Model interpretation methods ameliorate this opacity by providing explanations for specific model predictions. Unfortunately, existing interpretation codebases make it difficult to apply these methods to new models and tasks, which hinders adoption for practitioners and burdens interpretability researchers. We introduce AllenNLP Interpret, a flexible framework for interpreting NLP models. The toolkit provides interpretation primitives (e.g., input gradients) for any AllenNLP model and task, a suite of built-in interpretation methods, and a library of front-end visualization components. We demonstrate the toolkit’s flexibility and utility by implementing live demos for five interpretation methods (e.g., saliency maps and adversarial attacks) on a variety of models and tasks (e.g., masked language modeling using BERT and reading comprehension using BiDAF). These demos, alongside our code and tutorials, are available at https://allennlp.org/interpret.
Full Text Available
MedICaT: A Dataset of Medical Images, Captions, and Textual References

https://doi.org/10.18653/v1/2020.findings-emnlp.191

Subramanian, Sanjay; Wang, Lucy Lu; Bogin, Ben; Mehta, Sachin; van Zuylen, Madeleine; Parasa, Sravanthi; Singh, Sameer; Gardner, Matt; Hajishirzi, Hannaneh (January 2020, Findings of the Association for Computational Linguistics: EMNLP)

Understanding the relationship between figures and text is key to scientific document understanding. Medical figures in particular are quite complex, often consisting of several subfigures (75% of figures in our dataset), with detailed text describing their content. Previous work studying figures in scientific papers focused on classifying figure content rather than understanding how images relate to the text. To address challenges in figure retrieval and figure-to-text alignment, we introduce MedICaT, a dataset of medical images in context. MedICaT consists of 217K images from 131K open access biomedical papers, and includes captions, inline references for 74% of figures, and manually annotated subfigures and subcaptions for a subset of figures. Using MedICaT, we introduce the task of subfigure to subcaption alignment in compound figures and demonstrate the utility of inline references in image-text matching. Our data and code can be accessed at https://github.com/allenai/medicat.
Full Text Available
Evaluating Models’ Local Decision Boundaries via Contrast Sets

https://doi.org/10.18653/v1/2020.findings-emnlp.117

Gardner, Matt; Artzi, Yoav; Basmov, Victoria; Berant, Jonathan; Bogin, Ben; Chen, Sihao; Dasigi, Pradeep; Dua, Dheeru; Elazar, Yanai; Gottumukkala, Ananth; et al. (January 2020, Findings of Empirical Methods in Natural Language Processing)
Full Text Available